I classified each location by the type of surf break it is:
Point Break: As the name implies, these are primarily long shoulder breaks over reef or cobble, and they are typically surfed with maneuvers. The waves are usually not very heavy; instead, the shoulders run for a long time.
Reef Break: These are heavy reef breaks that are typically surfed as barrel locations, although conditions sometimes call for maneuver-based surfing. The waves are typically heavier, with very critical sections on top of the reef.
Beach Break: These are sandbars where the waves depend heavily on the location. They are usually very fast and can be surfed as barrels or with maneuvers; most spots see a mix of both depending on how the wave is breaking. Beach breaks are usually more forgiving, and their waves are much less consistent than those at reef and point breaks.
Wave Pool: This brings out how much technical skill a surfer has. Every wave is identical for each surfer, so all other ocean factors are eliminated and the result depends only on how well the surfer rides it.
Here I look at the standardized scores for each location, categorized by break type.
Here are graphs for some of the top surfers.
(I decided to look only at Championship Tour (CT) events. At locations that only host Challenger Series (CS) events, these surfers always had a high standardized score regardless of break type or location, simply because they were better than the competition.)
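As a reminder of what a standardized score means here, the following is a minimal sketch of a within-event z-score in base R. It mirrors the idea of the `score_std` column (score minus the event mean, divided by the event standard deviation); the toy data frame and its values are made up for illustration only.

```r
# Toy data (hypothetical): two events with different overall scoring levels
scores <- data.frame(
  eventId = c(1, 1, 1, 2, 2, 2),
  score   = c(4, 6, 8, 10, 12, 14)
)

# z-score within each event: subtract the event mean, divide by the event SD
scores$score_std <- ave(
  scores$score, scores$eventId,
  FUN = function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
)

print(scores)
```

Because each event is centered and scaled on its own, a score of 8 in a low-scoring event and a score of 14 in a high-scoring event end up with the same standardized value.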
```{r}
italo <- ct_2loc |>
  filter(athlete_name == 'Italo Ferreira', tourId == 1)

italo |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Italo Ferreira",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Yago_Dora <- ct_2loc |>
  filter(athlete_name == 'Yago Dora', tourId == 1)

Yago_Dora |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Yago Dora",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Ethan_Ewing <- ct_2loc |>
  filter(athlete_name == 'Ethan Ewing', tourId == 1)

Ethan_Ewing |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Ethan Ewing",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Griffin_Colapinto <- ct_2loc |>
  filter(athlete_name == 'Griffin Colapinto', tourId == 1)

Griffin_Colapinto |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Griffin Colapinto",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```
I then decided to explore lower-performing CT surfers. I was curious whether break type has more influence on these surfers, since they probably did really well at some events but not at others, unlike the top surfers on the tour, who probably score highly regardless of break type.
```{r}
Jordy_Smith <- ct_2loc |>
  filter(athlete_name == 'Jordy Smith', tourId == 1)

Jordy_Smith |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Jordy Smith",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Kanoa_Igarashi <- ct_2loc |>
  filter(athlete_name == 'Kanoa Igarashi', tourId == 1)

Kanoa_Igarashi |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Kanoa Igarashi",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Miguel_Pupo <- ct_2loc |>
  filter(athlete_name == 'Miguel Pupo', tourId == 1)

Miguel_Pupo |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Miguel Pupo",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Seth_Moniz <- ct_2loc |>
  filter(athlete_name == 'Seth Moniz', tourId == 1)

Seth_Moniz |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Seth Moniz",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Joao_Chianca <- ct_cs |>
  filter(athlete_name == 'Joao Chianca', tourId == 1)

Joao_Chianca |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Joao Chianca",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Leonardo_Fioravanti <- ct_cs |>
  filter(athlete_name == 'Leonardo Fioravanti', tourId == 1)

Leonardo_Fioravanti |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Leonardo Fioravanti",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Filipe_Toledo <- ct_cs |>
  filter(athlete_name == 'Filipe Toledo', tourId == 1)

Filipe_Toledo |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Filipe Toledo",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Jack_Robinson <- ct_cs |>
  filter(athlete_name == 'Jack Robinson', tourId == 1)

Jack_Robinson |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Jack Robinson",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```
I looked into the Challenger Series, but there weren't enough competitions and data to get results for the effect of break type or location.
```{r}
Jake_Marshall <- ct_cs |>
  filter(athlete_name == 'Jake Marshall', tourId == 12)

Jake_Marshall |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type for CS",
    subtitle = "Jake Marshall",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Crosby_Colapinto <- ct_cs |>
  filter(athlete_name == 'Crosby Colapinto', tourId == 12)

Crosby_Colapinto |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type for CS",
    subtitle = "Crosby Colapinto",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```
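Looking into which surfers have competed in at least 5 CS years and 3 CT years (the code counts distinct event years per tour rather than individual events).
For 13 of the 87 surfers, the ANOVA p-value for break type is at or below the 0.1 cutoff used in the code, so their scores appear to be influenced by break type. With 87 separate tests at that cutoff, roughly 9 such results would be expected by chance alone, so this count should be read cautiously.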
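The per-surfer test described above can be sketched in base R as follows. This is only an illustration on simulated data, not the report's actual results: the surfer names and scores are hypothetical, and the Benjamini-Hochberg adjustment via `p.adjust()` is an addition I am suggesting for handling many tests at once, not something the original analysis did.

```r
set.seed(1)

# Hypothetical toy data: 3 surfers x 3 break types x 10 standardized scores
toy <- expand.grid(
  athlete_name = c("Surfer A", "Surfer B", "Surfer C"),
  breakType    = c("Point Break", "Reef Break", "Beach Break"),
  rep          = 1:10,
  stringsAsFactors = FALSE
)
toy$score_std <- rnorm(nrow(toy))

# Give Surfer A a built-in advantage at reef breaks so one effect is real
is_a_reef <- toy$athlete_name == "Surfer A" & toy$breakType == "Reef Break"
toy$score_std[is_a_reef] <- toy$score_std[is_a_reef] + 2

# One-way ANOVA of score_std on breakType, fitted separately for each surfer
pvals <- sapply(split(toy, toy$athlete_name), function(d) {
  anova(lm(score_std ~ breakType, data = d))[["Pr(>F)"]][1]
})

# Raw p-values next to Benjamini-Hochberg adjusted p-values
result <- data.frame(
  athlete_name = names(pvals),
  p_raw        = unname(pvals),
  p_adj        = unname(p.adjust(pvals, method = "BH"))
)
print(result)
```

Comparing `p_adj` rather than `p_raw` against the cutoff gives a more conservative count of surfers whose scores depend on break type.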
Stances correlation
```{r}
ct |>
  filter(!is.na(stance)) |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = stance)) +
  geom_boxplot(position = position_dodge(), outlier.shape = NA) +
  labs(
    title = "Standardized Score by Location and Stance",
    x = "Location",
    y = "Standardized Score (score_std)",
    fill = "Stance"
  ) +
  theme_minimal() +
  coord_flip()
```
T-test showing the effect of a surfer's stance on score_std for each location. (The output underneath is just testing whether the normality and equal-variance assumptions are met.)
```{r}
leveneTest(score_std ~ breakType, data = ct) |> kable()
```

|       |   Df |   F value |    Pr(>F) |
|:------|-----:|----------:|----------:|
| group |    3 | 0.3352487 | 0.7998617 |
|       | 4458 |        NA |        NA |
Stance does not appear to have much of an effect on how surfers perform, except at a select few locations such as Teahupoʻo, so I decided not to look further into this.
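For reference, the stance comparison can be sketched as a two-sample t-test in base R. The data below are simulated and hypothetical, so the numbers say nothing about the real surfers. Note that `t.test()` runs Welch's version by default, which does not assume equal variances between the two groups.

```r
set.seed(42)

# Hypothetical toy data: 30 standardized scores per stance at one location
toy <- data.frame(
  stance    = rep(c("Goofy", "Regular"), each = 30),
  score_std = c(rnorm(30, mean = 0.3), rnorm(30, mean = 0))
)

# Welch two-sample t-test (var.equal = FALSE is the default)
tt <- t.test(score_std ~ stance, data = toy)

tt$method    # which test was run
tt$estimate  # group means for Goofy and Regular
tt$p.value   # p-value for the difference in means
tt$conf.int  # 95% confidence interval for the difference
```

Since the Welch test tolerates unequal variances, the Levene check mainly matters if a pooled-variance test (`var.equal = TRUE`) is used instead.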
These are the same graphs as earlier, but with each surfer's odds overlaid for each location where available, to add depth to what is shown.
```{r}
italo <- ct_2loc |>
  filter(athlete_name == "Italo Ferreira", tourId == 1)

odds_by_loc <- italo |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

italo_lab <- italo |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc), locationName,
                                paste0(locationName, " (", odds_loc, ")")))

italo_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Italo Ferreira — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Jack_Robinson <- ct_2loc |>
  filter(athlete_name == "Jack Robinson", tourId == 1)

odds_by_loc <- Jack_Robinson |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Jack_Robinson_lab <- Jack_Robinson |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc), locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Jack_Robinson_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Jack Robinson — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Filipe_Toledo <- ct_2loc |>
  filter(athlete_name == "Filipe Toledo", tourId == 1)

odds_by_loc <- Filipe_Toledo |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Filipe_Toledo_lab <- Filipe_Toledo |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc), locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Filipe_Toledo_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Filipe Toledo — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Jordy_Smith <- ct_2loc |>
  filter(athlete_name == "Jordy Smith", tourId == 1)

odds_by_loc <- Jordy_Smith |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Jordy_Smith_lab <- Jordy_Smith |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc), locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Jordy_Smith_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Jordy Smith — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Joao_Chianca <- ct_2loc |>
  filter(athlete_name == "Joao Chianca", tourId == 1)

odds_by_loc <- Joao_Chianca |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Joao_Chianca_lab <- Joao_Chianca |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc), locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Joao_Chianca_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Joao Chianca — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

```{r}
Leonardo_Fioravanti <- ct_2loc |>
  filter(athlete_name == "Leonardo Fioravanti", tourId == 1)

odds_by_loc <- Leonardo_Fioravanti |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Leonardo_Fioravanti_lab <- Leonardo_Fioravanti |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc), locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Leonardo_Fioravanti_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Leonardo Fioravanti — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```
Here I grouped by surfer at an individual location to compare standardized scores, keeping the 15 surfers with the best Bet365 odds for this past competition year.
My take: Surfers with better odds seem to have competed well at these locations in the past. One thing to note is that some surfers, like George Pittar, have much higher standardized scores than the surfers around them on the chart but still have much worse odds. This is because he does not usually perform well, so it is actually a really good result: instead of the roughly +6000 odds to win that he would usually be expected to have, he has roughly +3000 odds because of how he has done at this location in the past. His past results are not good enough to bring him to around +600 like the surfers near him on the chart, because he is not a highly ranked surfer on the tour.
# A tibble: 5 × 3
athlete_name locationName odds
<chr> <chr> <dbl>
1 George Pittar Bells Beach 5000
2 George Pittar Gold Coast 6600
3 George Pittar Margaret River 3300
4 George Pittar Pipeline 15000
5 George Pittar Surf Abu Dhabi 799900
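To make odds like these easier to compare, they can be turned into implied win probabilities. The sketch below assumes the `odds` column holds American (moneyline) odds, which is the format the merge step converts to; the helper name `american_to_prob` is hypothetical and not part of the original analysis. The Surf Abu Dhabi value is left out of the example.

```r
# Hypothetical helper: convert American (moneyline) odds to implied probability
american_to_prob <- function(odds) {
  ifelse(odds > 0,
         100 / (odds + 100),     # underdog: +3300 means winning 3300 on a 100 stake
         -odds / (-odds + 100))  # favourite: -150 means staking 150 to win 100
}

# George Pittar's odds from the table above
pittar_odds <- c("Bells Beach" = 5000, "Gold Coast" = 6600,
                 "Margaret River" = 3300, "Pipeline" = 15000)

# Implied win probability in percent
round(100 * american_to_prob(pittar_odds), 2)
```

On this scale, the move from about +6600 to +3300 roughly doubles his implied chance of winning, from about 1.5% to about 2.9%. Implied probabilities from a bookmaker include its margin, so across a full field they sum to more than 100%.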
Final Visualizations/Analysis
Creating a new variable for surfers' home nations so that it can be added to the visualizations:
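One way such a home/away variable could be built is sketched below in base R. This is an assumption about the approach, not the code used in this report: the `location_nation` lookup and the `home_away` column name are hypothetical, the lookup only has a few illustrative entries, and it assumes `nationAbbr` holds three-letter nation codes.

```r
# Hypothetical lookup: nation code for each location (illustrative entries only)
location_nation <- c(
  "Bells Beach"    = "AUS",
  "Margaret River" = "AUS",
  "Saquarema"      = "BRA"
)

# Hypothetical toy rows: surfer, surfer's nation code, and event location
toy <- data.frame(
  athlete_name = c("Surfer A", "Surfer A", "Surfer B"),
  nationAbbr   = c("AUS", "AUS", "BRA"),
  locationName = c("Bells Beach", "Saquarema", "Margaret River"),
  stringsAsFactors = FALSE
)

# "Home" when the location's nation matches the surfer's nation, else "Away"
toy$home_away <- unname(
  ifelse(location_nation[toy$locationName] == toy$nationAbbr, "Home", "Away")
)

print(toy)
```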
These graphs show the distribution of a surfer's standardized scores by location and break type, along with Bet365's odds for the current competition season and whether each location is in the surfer's home nation or away. Only locations where the surfer has competed at least twice are included.
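The "at least twice" rule counts distinct events, not rows. A minimal base R sketch of that filter on hypothetical toy data is shown here; the report itself builds `ct_2loc` with dplyr, so this is only an equivalent illustration.

```r
# Hypothetical toy results: one row per scored wave
toy <- data.frame(
  athlete_name = c("A", "A", "A", "A", "B", "B"),
  locationName = c("Pipeline", "Pipeline", "Pipeline", "Bells Beach",
                   "Pipeline", "Pipeline"),
  eventId      = c(1, 1, 2, 3, 1, 1),
  stringsAsFactors = FALSE
)

# Reduce to one row per surfer, location, and event before counting
events <- unique(toy[, c("athlete_name", "locationName", "eventId")])

# Count distinct events per surfer and location
counts <- aggregate(eventId ~ athlete_name + locationName,
                    data = events, FUN = length)
names(counts)[3] <- "n_events"

# Keep only surfer-location pairs with at least two events
keep <- counts[counts$n_events >= 2, c("athlete_name", "locationName")]
toy_2loc <- merge(toy, keep, by = c("athlete_name", "locationName"))

print(toy_2loc)
```

Surfer B has two rows at Pipeline but both come from the same event, so only Surfer A at Pipeline survives the filter.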
These graphs show the distribution of standardized scores at a single location for the 15 surfers with the best Bet365 odds for the current competition season, again marking home and away nations.
Some of the data in the original data set I was given is incorrect, which affects the results:
A lot of heat information is missing for certain locations, as seen by comparing against past competitions. I added the tables below to show which surfers have multiple events at a location, to help catch any inconsistencies.
---
title: "AltSports Surf Location Analysis"
author: "Owen Loughery"
format:
  html:
    theme:
      light: cosmo
      dark: darkly
    toggle: true
    embed-resources: true
    code-tools: true
    toc: true
    toc-depth: 3
    toc-expand: 1
    number-sections: false
    code-fold: true
editor: source
execute:
  error: true
  echo: true
  message: false
  warning: false
---

# Make sure to use Table of Contents to help navigate report!

# Libraries used

```{r}
library(jsonlite)
library(tidyverse)
library(knitr)
library(rlang)
library(ggtext)
```

# Importing and cleaning data

Importing Data:

```{r}
trestles <- fromJSON("C:/Users/rdlou/class/Altsports/4784.json")
surf <- read.csv("C:/Users/rdlou/Downloads/wsl_api_combined.csv/wsl_api_combined.csv")
stances <- read.csv("C:/Users/rdlou/Downloads/wsl_api_surfers_combined.csv")
```

Adding Stances to dataset:

```{r}
stances <- stances |>
  select(athlete_id = athleteId, stance, nationAbbr) |>
  distinct(athlete_id, .keep_all = TRUE)

surf <- left_join(surf, stances, by = "athlete_id")
```

Cleaning up the surf data set to only use completed heats and also ct/cs events:

```{r}
ct_cs <- surf |>
  filter(heat_status == 'completed', tourId == 1 | tourId == 12 | tourId == 2)

ct_cs$locationName <- as.factor(ct_cs$locationName)
```

I classified each location by the type of surf break it is.

**Point Break**: As the name implies, these are primarily long shoulder breaks over reef or cobble, and they are typically surfed with maneuvers. The waves are usually not very heavy; instead, the shoulders run for a long time.

**Reef Break**: These are heavy reef breaks that are typically surfed as barrel locations, although conditions sometimes call for maneuver-based surfing. The waves are typically heavier, with very critical sections on top of the reef.

**Beach Break**: These are sandbars where the waves depend heavily on the location. They are usually very fast and can be surfed as barrels or with maneuvers; most spots see a mix of both depending on how the wave is breaking.
Usually more forgiving and waves are way more inconsistent than reef and point type breaks.**Wave Pool**: Brings out how much technical skill a surfer has as every wave is the exact same for each surfer so really only matters at how good you can surf it by eliminating all other ocean factors.```{r}ct_cs <- ct_cs |>mutate(breakType =case_when( locationName %in%c("Gold Coast", "Jeffreys Bay", "Barra de la Cruz","Ribeira D'Ilhas", "Punta Roca", "Merewether Beach", "Lower Trestles", "Bells Beach") ~"Point Break", locationName %in%c("Uluwatu", "Keramas", "Teahupoʻo", "Banzai Pipeline", "Pipeline","Margaret River", "Ali'i Beach", "Rottnest Island", "Sunset Beach","G-Land", "Cloudbreak") ~"Reef Break", locationName %in%c("Saquarema", "Capbreton / Hossegor / Seignosse", "Supertubos","Manly Beach", "Huntington Beach", "Narrabeen", "Peniche Centre Region","Ballito", "Itauna", "North Narrabeen", "Newcastle") ~"Beach Break", locationName %in%c("Lemoore", "Surf Abu Dhabi") ~"Wavepool",TRUE~"other" )) |>group_by(eventId) |>mutate(score_std = (score -mean(score, na.rm =TRUE)) /sd(score, na.rm =TRUE) ) |>ungroup()ct_cs <- ct_cs |>mutate(locationName =if_else(locationName =="Banzai Pipeline", "Pipeline", locationName))ct <- ct_cs |>filter(tourId ==1)head(ct_cs) |>kable()```Merging odds into data set (Had chat gpt help do this as would have been extremely tedious otherwise since was not formatted well to merge)```{r}library(tidyverse)# ---- 1) Load the standardized odds table from CSV ----odds_std <-read_csv("C:/Users/rdlou/Downloads/surfer_odds_standardized.csv", show_col_types =FALSE) %>%mutate(athlete_name =str_squish(as.character(athlete_name)),locationName =str_squish(as.character(locationName)) )# ---- 2) Optional: Duplicate Rio Pro odds for both "Saquarema" and "Itauna" ----odds_std <-bind_rows( odds_std, odds_std %>%filter(locationName =="Saquarema") %>%mutate(locationName ="Itauna")) %>%distinct()# ---- 3) Helper function to merge with your results ----merge_odds 
<-function(results_df, odds_tbl = odds_std, join =c("left","inner","right","full")) { join <-match.arg(join) results_norm <- results_df %>%mutate(athlete_name =str_squish(as.character(athlete_name)),locationName =str_squish(as.character(locationName)) ) jfun <-switch( join,left = left_join,inner = inner_join,right = right_join,full = full_join )jfun( results_norm, odds_tbl %>%select(athlete_name, locationName, event, eventStartDate, odds, odds_format),by =c("athlete_name","locationName") )}# ---- Example usage ----# Assuming your CT results are in a dataframe called `ct`ct <-merge_odds(ct, join ="left")# View unmatched cases:# ct %>% anti_join(odds_std, by = c("athlete_name","locationName")) %>% count(locationName)ct <- ct |>select(-event, -eventStartDate.y) |>rename(eventStartDate = eventStartDate.x)ct <- ct |>mutate(odds =case_when( odds_format =="EU"& odds >=2~ (odds -1) *100, odds_format =="EU"& odds <2~-100/ (odds -1),TRUE~ odds ),odds_format =if_else(odds_format =="EU", "US", odds_format) )```Making a dataset that only includes surfers that have competed in a location at least twice:```{r}surfer_location_counts <- ct |>distinct(athlete_name, locationName, eventId) |>group_by(athlete_name, locationName) |>summarise(n_events =n(), .groups ="drop") |>filter(n_events >=2)ct_2loc <- ct |>inner_join(surfer_location_counts, by =c("athlete_name", "locationName"))```# Work process of getting to current graphsHere I am looking at what the standardized scores are like for each location while categorizing them by the break types.Here's graphs for some of the tops surfers(Also I decided to only look at the graphs of championship tour events because ones where the location was only at a CS spot the surfers always had a high standardized score regardless of the break type or location simply because they were better than the competition)```{r}italo <- ct_2loc |>filter(athlete_name =='Italo Ferreira', tourId ==1)italo |>ggplot(aes(x =reorder(locationName, score_std, FUN = 
median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Italo Ferreira",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Yago_Dora <- ct_2loc |>filter(athlete_name =='Yago Dora', tourId ==1)Yago_Dora |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Yago Dora",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Ethan_Ewing <- ct_2loc |>filter(athlete_name =='Ethan Ewing', tourId ==1)Ethan_Ewing |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Ethan Ewing",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Griffin_Colapinto <- ct_2loc |>filter(athlete_name =='Griffin Colapinto', tourId ==1)Griffin_Colapinto |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Griffin Colapinto",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()```Decided to explore lower performing CT surfers. I was curious if the effect of break type has more influence on these types of surfers as they probably did really well in some events but not others. 
Unlike the top surfers on the tour who probably perform higher regardless of the type of break.```{r}Jordy_Smith <- ct_2loc |>filter(athlete_name =='Jordy Smith', tourId ==1)Jordy_Smith |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Jordy Smith",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Kanoa_Igarashi <- ct_2loc |>filter(athlete_name =='Kanoa Igarashi', tourId ==1)Kanoa_Igarashi |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Kanoa Igarashi",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Miguel_Pupo <- ct_2loc |>filter(athlete_name =='Miguel Pupo', tourId ==1)Miguel_Pupo |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Miguel Pupo",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Seth_Moniz <- ct_2loc |>filter(athlete_name =='Seth Moniz', tourId ==1)Seth_Moniz |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores \n by Location and Break Type",subtitle ="Seth Moniz",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Joao_Chianca <- ct_cs |>filter(athlete_name =='Joao Chianca', tourId ==1)Joao_Chianca |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores \n by Location and Break Type",subtitle ="Joao Chianca",x 
="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Leonardo_Fioravanti <- ct_cs |>filter(athlete_name =='Leonardo Fioravanti', tourId ==1)Leonardo_Fioravanti |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores \n by Location and Break Type",subtitle ="Leonardo Fioravanti",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Filipe_Toledo <- ct_cs |>filter(athlete_name =='Filipe Toledo', tourId ==1)Filipe_Toledo |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores \n by Location and Break Type",subtitle ="Filipe Toledo",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Jack_Robinson <- ct_cs |>filter(athlete_name =='Jack Robinson', tourId ==1)Jack_Robinson |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores \n by Location and Break Type",subtitle ="Jack Robinson",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()```Looked into Challenger Series but weren't enough competitions and data to get results for effect of break type / location```{r}Jake_Marshall <- ct_cs |>filter(athlete_name =='Jake Marshall', tourId ==12)Jake_Marshall |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores \n by Location and Break Type for CS",subtitle ="Jake Marshall",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Crosby_Colapinto <- ct_cs |>filter(athlete_name =='Crosby Colapinto', tourId ==12)Crosby_Colapinto |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), 
y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores \n by Location and Break Type for CS",subtitle ="Crosby Colapinto",x ="Location",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()```Looking into which surfers have competed at least 5 cs events and 3 ct events```{r}ct_cs_years <- ct_cs %>%distinct(athlete_name, tourId, eventYear)surfer_counts <- ct_cs_years %>%group_by(athlete_name, tourId) %>%summarise(n_years =n(), .groups ="drop")surfer_wide <- surfer_counts %>% tidyr::pivot_wider(names_from = tourId, values_from = n_years, names_prefix ="tour_")dual_tour_surfer <- surfer_wide %>%filter(tour_1 >=3, tour_12 >=5)dual_tour_surfer |>select(athlete_name) |>kable()```Here I just did some testing out of curiosity```{r}library(dplyr)library(purrr)library(broom)surfer_pvals <- ct |>filter(!is.na(score_std), !is.na(breakType)) |>group_by(athlete_name) |>filter(n_distinct(breakType) >1) |>nest() |>mutate(model =map(data, ~lm(score_std ~ breakType, data = .x)),anova =map(model, ~anova(.x)) ) |>mutate(breakType_pval =map_dbl(anova, ~ .x$`Pr(>F)`[1]) )```Testing using ANOVA to see if standardized scores vary signif. based on break type for each individual surfer.```{r}surfer_pvals |>filter(breakType_pval <= .1) |>select(athlete_name, breakType_pval)```13/87 of the surfers have significant p-values so their scores appear to be heavily influenced by the break typeStances correlation```{r}ct |>filter(!is.na(stance)) |>ggplot(aes(x =reorder(locationName, score_std, FUN = median), y = score_std, fill = stance)) +geom_boxplot(position =position_dodge(), outlier.shape =NA) +labs(title ="Standardized Score by Location and Stance",x ="Location",y ="Standardized Score (score_std)",fill ="Stance" ) +theme_minimal() +coord_flip()```T-test showing effect of a surfers stance on score_std for each location. 
(underneath are just testing to see if normality and equal variance assumptions are met)```{r}stance_tests <- ct |>filter(!is.na(score_std), !is.na(stance)) |>group_by(locationName) |>filter(n_distinct(stance) ==2) |>summarise(p_value =tryCatch(t.test(score_std ~ stance)$p.value, error =function(e) NA),.groups ='drop' )stance_tests |>kable()ggplot(ct, aes(x = score_std)) +geom_histogram(bins =30) +facet_wrap(~ locationName)library(car)leveneTest(score_std ~ locationName, data = ct) |>kable()```T-test to see if there is a significant difference in standardized scores between goofy and regular foot for each type of break.```{r}surf_clean <- ct |>filter(!is.na(score_std), !is.na(stance), !is.na(breakType), stance =="Goofy"| stance =="Regular")results <- surf_clean |>group_by(breakType) |>do(tidy(t.test(score_std ~ stance, data = .)))results |>select(breakType, estimate, estimate1, estimate2, p.value, conf.low, conf.high) |>kable()ggplot(ct, aes(x = score_std)) +geom_histogram(bins =30) +facet_wrap(~ breakType)leveneTest(score_std ~ breakType, data = ct) |>kable()```It does not appear that stance has much of an effect on how surfers perform besides at a very select few locations like Teahupoo. 
So I decided to not look further into this.Same graph as earlier but overlayed surfers odds for each location where applicable to further the depth into what is shown```{r}italo <- ct_2loc |>filter(athlete_name =="Italo Ferreira", tourId ==1)odds_by_loc <- italo |>distinct(locationName, odds) |>group_by(locationName) |>slice(1) |>ungroup()italo_lab <- italo |>left_join(odds_by_loc, by ="locationName", suffix =c("", "_loc")) |>mutate(locationLabel =ifelse(is.na(odds_loc), locationName,paste0(locationName, " (", odds_loc, ")")))italo_lab |>ggplot(aes(x =reorder(locationLabel, score_std, FUN = median),y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Italo Ferreira — odds shown in parentheses",x ="Location (2025 odds)",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Jack_Robinson <- ct_2loc |>filter(athlete_name =="Jack Robinson", tourId ==1)odds_by_loc <- Jack_Robinson |>distinct(locationName, odds) |>group_by(locationName) |>slice(1) |>ungroup()Jack_Robinson_lab <- Jack_Robinson |>left_join(odds_by_loc, by ="locationName", suffix =c("", "_loc")) |>mutate(locationLabel =ifelse(is.na(odds_loc), locationName,paste0(locationName, " (", odds_loc, ")")))Jack_Robinson_lab |>ggplot(aes(x =reorder(locationLabel, score_std, FUN = median),y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Jack Robinson — odds shown in parentheses",x ="Location (2025 odds)",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Filipe_Toledo <- ct_2loc |>filter(athlete_name =="Filipe Toledo", tourId ==1)odds_by_loc <- Filipe_Toledo |>distinct(locationName, odds) |>group_by(locationName) |>slice(1) |>ungroup()Filipe_Toledo_lab <- Filipe_Toledo |>left_join(odds_by_loc, by ="locationName", suffix =c("", "_loc")) |>mutate(locationLabel =ifelse(is.na(odds_loc), 
locationName,paste0(locationName, " (", odds_loc, ")")))Filipe_Toledo_lab |>ggplot(aes(x =reorder(locationLabel, score_std, FUN = median),y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Filipe Toledo — odds shown in parentheses",x ="Location (2025 odds)",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Jordy_Smith <- ct_2loc |>filter(athlete_name =="Jordy Smith", tourId ==1)odds_by_loc <- Jordy_Smith |>distinct(locationName, odds) |>group_by(locationName) |>slice(1) |>ungroup()Jordy_Smith_lab <- Jordy_Smith |>left_join(odds_by_loc, by ="locationName", suffix =c("", "_loc")) |>mutate(locationLabel =ifelse(is.na(odds_loc), locationName,paste0(locationName, " (", odds_loc, ")")))Jordy_Smith_lab |>ggplot(aes(x =reorder(locationLabel, score_std, FUN = median),y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Jordy Smith — odds shown in parentheses",x ="Location (2025 odds)",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Joao_Chianca <- ct_2loc |>filter(athlete_name =="Joao Chianca", tourId ==1)odds_by_loc <- Joao_Chianca |>distinct(locationName, odds) |>group_by(locationName) |>slice(1) |>ungroup()Joao_Chianca_lab <- Joao_Chianca |>left_join(odds_by_loc, by ="locationName", suffix =c("", "_loc")) |>mutate(locationLabel =ifelse(is.na(odds_loc), locationName,paste0(locationName, " (", odds_loc, ")")))Joao_Chianca_lab |>ggplot(aes(x =reorder(locationLabel, score_std, FUN = median),y = score_std, fill = breakType)) +geom_boxplot() +labs(title ="Distribution of Standardized Scores by \n Location and Break Type",subtitle ="Joao Chianca — odds shown in parentheses",x ="Location (2025 odds)",y ="score_std",fill ="Break Type" ) +theme_minimal() +coord_flip()``````{r}Leonardo_Fioravanti <- ct_2loc |>filter(athlete_name =="Leonardo Fioravanti", 
         tourId == 1)

odds_by_loc <- Leonardo_Fioravanti |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Leonardo_Fioravanti_lab <- Leonardo_Fioravanti |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc),
                                locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Leonardo_Fioravanti_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
             y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Leonardo Fioravanti — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
```

Here I look at one location at a time and compare surfers' standardized scores there, limited to the 15 surfers with the top Bet365 odds for this past competition year.

```{r}
# Top 15 surfers by odds at Teahupoʻo (one odds value per surfer)
top_odds_teahupoo <- ct |>
  filter(locationName == "Teahupoʻo") |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 15)

ct_top_odds_teahupoo <- ct |>
  semi_join(top_odds_teahupoo, by = "athlete_name") |>
  filter(locationName == "Teahupoʻo")

odds_labels <- top_odds_teahupoo |>
  mutate(athlete_name = factor(athlete_name))

ggplot(ct_top_odds_teahupoo,
       aes(x = reorder(athlete_name, score_std, FUN = median),
           y = score_std, fill = athlete_name)) +
  geom_boxplot(show.legend = FALSE) +
  geom_text(data = odds_labels,
            aes(x = athlete_name,
                y = max(ct_top_odds_teahupoo$score_std) + 0.05,
                label = paste0("Odds: ", odds)),
            inherit.aes = FALSE,
            color = "red", fontface = "bold", size = 3) +
  labs(
    title = "Surfers w/ Top 15 Odds at Teahupoʻo — Standardized Scores & Odds",
    x = "Surfer",
    y = "Standardized Score"
  ) +
  theme_minimal() +
  coord_flip(clip = "off") +
  theme(plot.margin = margin(5, 50, 5, 5))
```

```{r}
top_odds <- ct |>
  filter(locationName == "Bells Beach") |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 15)

ct_top_odds <-
  ct |>
  semi_join(top_odds, by = "athlete_name") |>
  filter(locationName == "Bells Beach")

# Offset used to place each odds label just past the surfer's highest score
range_offset <- 0.05 * diff(range(ct_top_odds$score_std, na.rm = TRUE))

odds_labels <- ct_top_odds |>
  group_by(athlete_name) |>
  summarise(whisker_top = max(score_std, na.rm = TRUE), .groups = "drop") |>
  left_join(top_odds, by = "athlete_name") |>
  mutate(y_pos = whisker_top + range_offset + .1,
         athlete_name = factor(athlete_name))

ggplot(ct_top_odds,
       aes(x = reorder(athlete_name, score_std, FUN = median),
           y = score_std, fill = athlete_name)) +
  geom_boxplot(show.legend = FALSE, outlier.shape = NA) +
  geom_text(data = odds_labels,
            aes(x = athlete_name, y = y_pos, label = paste0("Odds: ", odds)),
            inherit.aes = FALSE,
            color = "red", fontface = "bold", size = 3) +
  labs(
    title = "Surfers w/ Top 15 Odds at Bells Beach — Standardized Scores & Odds",
    x = "Surfer",
    y = "Standardized Score"
  ) +
  theme_minimal() +
  coord_flip(clip = "off") +
  theme(plot.margin = margin(5, 60, 5, 5),
        plot.title = element_text(hjust = 0.5))
```

```{r}
top_odds <- ct |>
  filter(locationName == "Margaret River") |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 15)

ct_top_odds <- ct |>
  semi_join(top_odds, by = "athlete_name") |>
  filter(locationName == "Margaret River")

range_offset <- 0.05 * diff(range(ct_top_odds$score_std, na.rm = TRUE))

odds_labels <- ct_top_odds |>
  group_by(athlete_name) |>
  summarise(whisker_top = max(score_std, na.rm = TRUE), .groups = "drop") |>
  left_join(top_odds, by = "athlete_name") |>
  mutate(y_pos = whisker_top + range_offset + .1,
         athlete_name = factor(athlete_name))

ggplot(ct_top_odds,
       aes(x = reorder(athlete_name, score_std, FUN = median),
           y = score_std, fill = athlete_name)) +
  geom_boxplot(show.legend = FALSE, outlier.shape = NA) +
  geom_text(data = odds_labels,
            aes(x = athlete_name, y = y_pos, label = paste0("Odds: ", odds)),
            inherit.aes = FALSE,
            color = "red", fontface = "bold", size = 3) +
  labs(
    title = "Surfers w/ Top 15 Odds at Margaret River — Standardized Scores & Odds",
    x = "Surfer",
    y = "Standardized Score"
  ) +
  theme_minimal() +
  coord_flip(clip = "off") +
  theme(plot.margin = margin(5, 60, 5, 5),
        plot.title = element_text(hjust = 0.5))
```

My take: Surfers with better odds have generally competed well at these locations in the past. One thing to note is that some surfers, like George Pittar, have high standardized scores compared to the rest of the field but much worse odds than the surfers around them on the chart. Pittar does not usually perform well, so this is actually a really good result for him: instead of the +6000 odds to win that would usually be expected, he has +3000 because of how he has done in the past. His past results are not good enough to move him up to around +600 like the surfers near him on the chart, because he is not a highly ranked surfer on the tour.

Here are his other odds for comparison.

```{r}
ct |>
  filter(athlete_name == "George Pittar") |>
  group_by(locationName) |>
  slice(1) |>
  ungroup() |>
  select(athlete_name, locationName, odds) |>
  filter(!is.na(odds))
```

# Final Visualizations/Analysis

Creating a new variable for the nation of each location, so that surfers' home nations can be added to the visualizations:

```{r}
ct <- ct |>
  mutate(locationNation = case_when(
    locationName %in% c("Gold Coast", "Bells Beach", "Margaret River",
                        "Merewether Beach", "Narrabeen", "Rottnest Island") ~ "AUS",
    locationName %in% c("Uluwatu", "Keramas", "G-Land") ~ "INA",
    locationName == "Saquarema" | locationName == "Itauna" ~ "BRA",
    locationName == "Jeffreys Bay" ~ "RSA",
    locationName == "Teahupoʻo" ~ "PYF",
    locationName == "Lemoore" | locationName == "Lower Trestles" ~ "USA",
    # Capbreton / Hossegor / Seignosse is in France, not Portugal
    # (assumes the data codes France as "FRA")
    locationName == "Capbreton / Hossegor / Seignosse" ~ "FRA",
    locationName %in% c("Supertubos", "Peniche Centre Region") ~ "POR",
    locationName == "Pipeline" | locationName == "Sunset Beach" ~ "HAW",
    locationName == "Barra de la Cruz" ~ "MEX",
    locationName == "Punta Roca" ~ "SLV",
    locationName == "Cloudbreak" ~ "FJI",
    locationName == "Surf Abu Dhabi" ~ "UAE",
    TRUE ~ NA_character_
  ))

# Recode Kauli Vaast to PYF first, so that home_break below uses the corrected nation
ct <- ct |>
  mutate(nationAbbr = if_else(athlete_name == "Kauli Vaast", "PYF", nationAbbr))

ct <- ct |>
  mutate(home_break = nationAbbr == locationNation)

# Keep only surfer/location pairs with at least two distinct events
location_counts <- ct |>
  distinct(athlete_name, locationName, eventId) |>
  group_by(athlete_name, locationName) |>
  summarise(n_events = n(), .groups = "drop") |>
  filter(n_events >= 2)

ct_2_loc <- ct |>
  inner_join(location_counts, by = c("athlete_name", "locationName"))
```

## Graphs for analyzing individual surfers

These graphs show the distribution of a surfer's standardized scores by location and break type, along with Bet365's odds for the current competition season and whether each location is in the surfer's home nation. Only locations where the surfer has competed in at least two events are included.

Here is the code used to create these graphs:

```{r}
surfer_graph <- function(surfer_name) {
  Surfer <- ct_2_loc |>
    filter(athlete_name == .env$surfer_name, tourId == 1) |>
    mutate(home_break = nationAbbr == locationNation)

  # One odds value per location
  odds_by_loc <- Surfer |>
    distinct(locationName, odds) |>
    group_by(locationName) |>
    slice(1) |>
    ungroup()

  # Append the odds to the location label when they are available
  Surfer_lab <- Surfer |>
    left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
    mutate(
      locationLabel = ifelse(is.na(odds_loc),
                             locationName,
                             paste0(locationName, " (", odds_loc, ")"))
    )

  Surfer_lab |>
    ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
               y = score_std,
               fill = breakType,
               color = home_break)) +
    geom_boxplot(outlier.shape = NA, size = 0.7) +
    scale_color_manual(
      values = c(`TRUE` = "green3", `FALSE` = "black"),
      labels = c(`TRUE` = "Home Nation", `FALSE` = "Away Nation"),
      name = "Home/Away"
    ) +
    labs(
      title = "Distribution of Standardized Scores by Location and Break Type",
      subtitle = paste0(surfer_name, " — odds in parentheses"),
      x = "Location (2025 odds)",
      y = "Standardized Score",
      fill = "Break Type"
    ) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_markdown(hjust = 0.5)) +
    coord_flip()
}

surfer_table <- function(surfer_name) {
  Surfer <- ct_2_loc |>
    filter(athlete_name == .env$surfer_name, tourId == 1) |>
    mutate(home_break = nationAbbr == locationNation)

  # Events and heats surfed per location
  counts_tbl <- Surfer |>
    group_by(locationName) |>
    summarise(
      events = n_distinct(eventId),
      heats = n(),
      breakType = first(breakType),
      .groups = "drop"
    ) |>
    arrange(desc(events), desc(heats), locationName) |>
    select(locationName, events, heats, breakType) |>
    rename("Location Name" = locationName,
           "# of Events Surfed" = events,
           "# of Heats Surfed" = heats,
           "Break Type" = breakType)

  counts_tbl |>
    kable()
}
```

### Barron Mamiya

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Barron Mamiya")
surfer_table("Barron Mamiya")
```

### Connor O'Leary

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Connor O'Leary")
surfer_table("Connor O'Leary")
```

### Ethan Ewing

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Ethan Ewing")
surfer_table("Ethan Ewing")
```

### Filipe Toledo

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Filipe Toledo")
surfer_table("Filipe Toledo")
```

### Griffin Colapinto

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Griffin Colapinto")
surfer_table("Griffin Colapinto")
```

### Imaikalani deVault

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Imaikalani deVault")
surfer_table("Imaikalani deVault")
```

### Italo Ferreira

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Italo Ferreira")
surfer_table("Italo Ferreira")
```

### Jack Robinson

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Jack Robinson")
surfer_table("Jack Robinson")
```

### Jake Marshall

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Jake Marshall")
surfer_table("Jake Marshall")
```

### Joao Chianca

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Joao Chianca")
surfer_table("Joao Chianca")
```

### John John Florence

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("John John Florence")
surfer_table("John John Florence")
```

### Jordy Smith

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Jordy Smith")
surfer_table("Jordy Smith")
```

### Kanoa Igarashi

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Kanoa Igarashi")
surfer_table("Kanoa Igarashi")
```

### Leonardo Fioravanti

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Leonardo Fioravanti")
surfer_table("Leonardo Fioravanti")
```

### Liam O'Brien

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Liam O'Brien")
surfer_table("Liam O'Brien")
```

### Matthew McGillivray

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Matthew McGillivray")
surfer_table("Matthew McGillivray")
```

### Miguel Pupo

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Miguel Pupo")
surfer_table("Miguel Pupo")
```

### Rio Waida

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Rio Waida")
surfer_table("Rio Waida")
```

### Ryan Callinan

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Ryan Callinan")
surfer_table("Ryan Callinan")
```

### Seth Moniz

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Seth Moniz")
surfer_table("Seth Moniz")
```

### Yago Dora

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
surfer_graph("Yago Dora")
surfer_table("Yago Dora")
```

## Graphs for analyzing individual locations

These graphs show the distribution of surfers' standardized scores at a location for the top 15 surfers by Bet365's odds for the current competition season, and also whether each surfer is competing in their home nation or away.

Some of the data in the original data set I was given is incorrect, which affects the results: heat information is missing for certain locations, as can be seen by checking past competitions.
So I added the tables below, which show how many events and heats each surfer has at the location, to help catch any inconsistencies.

```{r}
top_odds <- function(surf_break) {
  # Top 15 surfers by odds at this location (one odds value per surfer)
  top_odds <- ct |>
    filter(locationName == surf_break) |>
    distinct(athlete_name, odds) |>
    group_by(athlete_name) |>
    slice(1) |>
    ungroup() |>
    arrange(odds) |>
    slice_head(n = 15)

  ct_top_odds <- ct |>
    semi_join(top_odds, by = "athlete_name") |>
    filter(locationName == surf_break)

  # Offset used to place each odds label just past the surfer's highest score
  range_offset <- 0.05 * diff(range(ct_top_odds$score_std, na.rm = TRUE))

  odds_labels <- ct_top_odds |>
    group_by(athlete_name) |>
    summarise(
      whisker_top = max(score_std, na.rm = TRUE),
      home_break = first(home_break),
      .groups = "drop"
    ) |>
    left_join(top_odds, by = "athlete_name") |>
    mutate(
      y_pos = whisker_top + range_offset + 0.1,
      athlete_name = factor(athlete_name),
      home_lab = factor(home_break,
                        levels = c(TRUE, FALSE),
                        labels = c("Home Country", "Away Country"))
    )

  ggplot(ct_top_odds,
         aes(x = reorder(athlete_name, score_std, FUN = median),
             y = score_std, fill = athlete_name)) +
    geom_boxplot(show.legend = FALSE, outlier.shape = NA) +
    geom_text(
      data = odds_labels,
      aes(x = athlete_name, y = y_pos,
          label = paste0("Odds: ", odds),
          color = home_lab),
      inherit.aes = FALSE,
      fontface = "bold", size = 3
    ) +
    scale_color_manual(values = c("Home Country" = "green", "Away Country" = "red")) +
    guides(color = "none") +
    labs(
      title = paste0("Surfers w/ Top 15 Odds at ", surf_break, " — Standardized Scores & Odds"),
      subtitle = "<span style='color:green'>■</span> Home Country <span style='color:red'>■</span> Away Country",
      x = "Surfer",
      y = "Standardized Score"
    ) +
    theme_minimal() +
    coord_flip(clip = "off") +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_markdown(hjust = 0.5),
          plot.margin = margin(5, 60, 5, 5))
}

location_table <- function(surf_break) {
  top_odds <- ct |>
    filter(locationName == surf_break) |>
    distinct(athlete_name, odds) |>
    group_by(athlete_name) |>
    slice(1) |>
    ungroup() |>
    arrange(odds) |>
    slice_head(n = 15)

  ct_top_odds <- ct |>
    semi_join(top_odds, by = "athlete_name") |>
    filter(locationName == surf_break)

  # Events and heats per surfer at this location
  counts_tbl <- ct_top_odds |>
    group_by(athlete_name) |>
    summarise(
      events = n_distinct(eventId),
      heats = n(),
      .groups = "drop"
    ) |>
    arrange(desc(events), desc(heats), athlete_name) |>
    select(athlete_name, events, heats) |>
    rename("Surfer Name" = athlete_name,
           "# of Events Competed" = events,
           "# of Heats Competed" = heats)

  counts_tbl |>
    kable()
}
```

### Bells Beach

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Bells Beach")
location_table("Bells Beach")
```

### Gold Coast

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Gold Coast")
location_table("Gold Coast")
```

### Itauna

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Itauna")
location_table("Itauna")
```

### Lower Trestles

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Lower Trestles")
location_table("Lower Trestles")
```

### Margaret River

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Margaret River")
location_table("Margaret River")
```

### Pipeline

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Pipeline")
location_table("Pipeline")
```

### Punta Roca

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Punta Roca")
location_table("Punta Roca")
```

### Saquarema

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Saquarema")
location_table("Saquarema")
```

### Teahupoʻo

```{r}
#| echo: false
#| fig.width: 9
#| fig.height: 6
top_odds("Teahupoʻo")
location_table("Teahupoʻo")
```
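
## Reading the odds as implied probabilities

To put the odds discussed for George Pittar (+6000, +3000, +600) in perspective, here is a minimal sketch that turns American-style odds into implied win probabilities. It assumes the odds are numeric moneyline values, which may not match how the `odds` column is actually stored. `implied_prob` and `pittar_odds` are hypothetical names used only for illustration and are not part of the analysis code. The result also ignores the bookmaker's margin, so it is only a rough guide.

```r
# Hypothetical helper (not part of the original analysis).
# Assumption: `odds` is numeric American/moneyline odds, e.g. 3000 for "+3000".
implied_prob <- function(odds) {
  ifelse(odds > 0,
         100 / (odds + 100),        # underdog: +3000 -> 100 / 3100
         -odds / (-odds + 100))     # favorite: -150  -> 150 / 250
}

# The three odds levels mentioned in the George Pittar discussion
pittar_odds <- c(usual = 6000, this_location = 3000, nearby_on_chart = 600)
round(implied_prob(pittar_odds), 3)
```

Under these assumptions, going from +6000 to +3000 roughly doubles the implied chance of winning (about 1.6% to 3.2%), while the +600 surfers near him on the chart sit at about 14.3%.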